# -*- coding: utf-8 -*-
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
<title>Wikipedia Revision Analysis</title>
<link rel="StyleSheet" href="/main.css" type="text/css" media="screen" />
<link rel="StyleSheet" href="/shared.css" type="text/css" media="screen" />
</head>

<body class="mediawiki ns-0 ltr">
<div id="globalWrapper"><div id="column-content">

<!-- Title -->
<div id="content" >
<h1 class="firstHeading">Wikipedia Revision Analysis</h1>
<div id="contentSub"></div>

<!-- Main content -->
<div id="bodyContent">
<p>A visual evaluation of <a href="http://wikipedia.org">Wikipedia</a> articles at the sentence level.</p>
<p>A sandbox (edit preview) is provided to test new revisions, and articles are analyzed with the following schemes:</p>

<ul>
<li><b>Persistent / New</b> - highlights <span style='background-color:#CCFFFF;'>light blue</span> for older content, and <span style='background-color:#FFFFCC;'>yellow</span> for newer content.
<br/>The longer text has been around, the more likely it is to be accurate (as it has passed before many Wikipedians' eyes).</li>
<li><b>Consistency / Controversy</b> - highlights <span style='background-color:#DDEEFF;'>blue</span> for consistent text, and <span style='background-color:#FFEEDD;'>orange</span> for heavily modified text. 
<br/>This is useful for determining areas under debate. </li>
<li><b>Added / Removed</b> - highlights <span style='background-color:#DDFFDD;'>green</span> for text that has been added and <span style='background-color:#FFDDDD;'>red</span> for text that has been removed by previous revisions.
<br/>The purpose of this is to detect vandalism and removed content -- the current article will be mostly green.</li>
</ul>

<h2>Wikipedia Integration</h2>
<p>A <a href="http://www.greasespot.net/">Greasemonkey</a> script (Greasemonkey is a <a href="http://www.mozilla.com/firefox/">Firefox</a> extension) that integrates the analysis into Wikipedia is available <a href="/wikianalysis.user.js">here</a>.  For available articles, a box appears at the top right of the page to select the analysis type.</p>

<h2>Article Search</h2>
<p>This project is currently in progress and only a small (fluctuating) subset of the Wikipedia corpus is available.</p>

<!-- Search form -->
<div class="portlet" id="p-search" style="text-align: center; vertical-align: middle; margin: 0 auto 0 auto; padding: 0.1em; width: 540px; max-width: 95%;">
<form action="/search" id="searchform">
<fieldset style="border: 1px solid #aaaaaa; background-color: #f9f9f9; padding: 0.7em; width: auto; margin-top: 0.5em;">
Search for title: 
<input type="text" name="term" size="20" id="searchInput" style="vertical-align: top; padding: 0; margin: 0; font-size: 1.2em;" value="${c.term}" />
<input type="submit"  value="  ›  " style="vertical-align: top; padding: 0; margin: 0; font-size: 120%;" />
</fieldset>
</form>
</div>
<!-- Focus -->
<script type="text/javascript">
 // Focus the search box by id so this does not depend on form order.
 document.getElementById('searchInput').focus();
</script>

<h2>Random Articles</h2>
${c.body}

<h2>Source Code</h2>
<p>Results are obtained by aggregating edit history with the <a href="http://crm114.sourceforge.net/">CRM114</a> discriminator, which then classifies the current state of the article (as of the latest Wikipedia dump).</p>
<p>The source code is hosted on Google Code <a href="http://code.google.com/p/wikipedia-controversy/">here</a>.  Note that it is written to plug into <a href="http://hadoop.apache.org/core/">Hadoop</a> (MapReduce) with the goal of running over the entire Wikipedia corpus.</p>
</div></div>

<!-- Tabs -->
<div id="p-cactions" class="portlet">
  <h5>Views</h5>
  <div class="pBody">
    <ul>
      <li class="selected"><a href="/">home</a></li>
      <li id="ca-talk"><a href="/search">search</a></li>
    </ul>
  </div>
</div>

</div></div>

<!-- Footer -->
<div class="visualClear"></div><div id="footer">
	developed by <a href="mailto:marlink@gmail.com">Mark Perry</a>, 
	made a reality by collaboration with Joseph Poon, 
	released under <a href="http://www.fsf.org/licensing/licenses/info/GPLv2.html">GPLv2</a><br/>
	site powered by <a href="http://pylonshq.com/">pylons</a>,
	corpus and css from <a href="http://wikipedia.org">Wikipedia</a>
</div>

</body></html>
